Research Interests: Description of Work
Author: João Sedoc
Abstract
My main research interest at present is the development and application of machine learning and statistical techniques for natural language processing (NLP). Vector space models of words are widely used across a variety of NLP tasks. The two main categories of word embedding are cluster-based and dense representations. Brown Clustering and other hierarchical clustering methods group similar words based on context. Dense representations such as Latent Semantic Analysis / Latent Semantic Indexing (LSA/LSI), Low Rank Multi-View Learning (LR-MVL) and Word2Vec have shown state-of-the-art results in syntactic and sentiment tasks. Because current embeddings focus on the word level rather than the phrase level, they do not capture both fine-grained and coarse context. Encoding meaning at the word, phrase and sentence level is therefore an area of intense research focus. Finally, distributional similarity in context does not imply semantic similarity: we find that antonyms are often distributionally similar.
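The last point, that antonyms can be distributionally similar, can be illustrated with a minimal sketch. The toy corpus, the window size and the helper names `context_vector` and `cosine` are assumptions made for illustration only; they are not taken from the work described here, which uses embeddings such as LSA, LR-MVL and Word2Vec rather than raw co-occurrence counts.

```python
from collections import Counter
import math

# Hypothetical toy corpus: the antonyms "hot" and "cold" appear in identical contexts.
corpus = [
    "the soup is hot today",
    "the soup is cold today",
    "the weather is hot today",
    "the weather is cold today",
    "the dog chased a ball",
]


def context_vector(target, sentences, window=2):
    """Count the words that occur within `window` tokens of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


hot = context_vector("hot", corpus)
cold = context_vector("cold", corpus)
dog = context_vector("dog", corpus)

sim_antonyms = cosine(hot, cold)   # antonyms that share all of their contexts
sim_unrelated = cosine(hot, dog)   # words that share no context words

print(f"hot vs cold: {sim_antonyms:.3f}")
print(f"hot vs dog:  {sim_unrelated:.3f}")
```

In this contrived corpus the two antonyms receive the same context counts, so a purely distributional measure cannot tell them apart, whereas the unrelated pair shares no context at all.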
Similar resources
Seating Assignment Using Constrained Signed Spectral Clustering
In this paper, we present a novel method for constrained cluster size signed spectral clustering (CSS) which allows us to subdivide large groups of people based on their relationships. In general, signed clustering only requires K hard clusters and does not constrain the cluster sizes. We extend signed clustering to include cluster size constraints. Using an example of seating assignment, we ef...
Semantic Word Clusters Using Signed Normalized Graph Cuts
Vector space representations of words capture many aspects of word similarity, but such methods tend to make vector spaces in which antonyms (as well as synonyms) are close to each other. We present a new signed spectral normalized graph cut algorithm, signed clustering, that overlays existing thesauri upon distributionally derived vector representations of words, so that antonym relationships ...
Phosphorus sorption by sediments in a southeastern coastal plain in-stream wetland.
A close relationship has been reported between sediment organic C (SedOC) content and its P sorption capacity (P(max)) and total P (TP) concentration. Phosphorus sorbed to organically complexed cations is a proposed explanation for this relationship. The objectives of this study were (i) to determine relationships between in-stream wetland SedOC content and both the sediment's P(max) and TP con...
Domain Aware Neural Dialog System
We investigate the task of building a domain aware chat system which generates intelligent responses in a conversation comprising different domains. The domain in this case is the topic or theme of the conversation. To achieve this, we present DOM-Seq2Seq, a domain aware neural network model based on the novel technique of using domain-targeted sequence-to-sequence models (Sutskever et al., ...
Enterprise to Computer: Star Trek chatbot
Human interactions and human-computer interactions are strongly influenced by style as well as content. Adding a persona to a chatbot makes it more human-like and contributes to a better and more engaging user experience. In this work, we propose a design for a chatbot that captures the style of Star Trek by incorporating references from the show along with peculiar tones of the fictional chara...
Publication date: 2015